by Peter de Blanc + ChatGPT Deep Research
Posted to Adarie (www.adarie.com) on July 22, 2025
Content License: Creative Commons CC0 (No Rights Reserved)
Generating step charts (beatmaps) from audio using deep learning has seen active research and development across various rhythm games. Recent methods typically use neural networks to predict timing (when notes occur) and pattern selection (which actions or positions) for games like DDR/StepMania, Osu!, Beat Saber, and others. Crucially, many systems incorporate difficulty modeling – allowing generation of easier or harder charts – and have been evaluated in game-like settings for playability. Below is a structured survey of notable deep learning–based approaches, organized by game type, with their key features, targeted games, ML techniques, and notes on difficulty and output quality.
Dance Dance Convolution (2017) – One of the first deep learning models for Dance Dance Revolution (DDR) charts. It uses a two-stage neural network: a CNN+RNN (LSTM) to predict step timings from audio spectrograms, and a conditional LSTM to decide which arrow to press at each timing. The model is explicitly conditioned on chart difficulty so it can generate charts at different difficulty levels. The authors trained on thousands of human-made StepMania charts and even released a StepMania demo where players uploaded songs and selected a difficulty; the system would produce a playable chart in seconds. In preliminary user tests, generated charts were playable and reasonably enjoyable (average satisfaction ~3.87/5). Notably, DDC proved that neural networks can learn both the rhythmic structure and the choreography patterns of DDR, producing results comparable to human-made charts in terms of playability. It performed best on higher difficulties (where more training data was available) and can create multiple distinct charts per song, addressing the lack of a single “ground truth” chart. This work established a baseline for “learning to choreograph” in rhythm games.
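To make the two-stage split concrete, here is a minimal sketch (in PyTorch) of how an onset network and a difficulty-conditioned step-selection network could be wired together. The class names, layer sizes, and the 256-symbol step vocabulary are illustrative assumptions for this survey, not DDC's actual implementation.

```python
# Illustrative sketch of the two-stage split: stage 1 predicts *when* a step
# occurs, stage 2 predicts *which* arrows. Not the authors' code.
import torch
import torch.nn as nn

class OnsetNet(nn.Module):
    """CNN over spectrogram frames + BiLSTM over time -> step/no-step probability per frame."""
    def __init__(self, n_mels=80, n_diff=5, hidden=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((1, 2)),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((1, 2)),
        )
        feat = 32 * (n_mels // 4)
        self.diff_emb = nn.Embedding(n_diff, 16)            # difficulty conditioning
        self.rnn = nn.LSTM(feat + 16, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, 1)                  # P(step at this frame)

    def forward(self, spec, difficulty):
        # spec: (batch, time, n_mels); difficulty: (batch,) integer level
        x = self.conv(spec.unsqueeze(1))                     # (B, 32, T, n_mels/4)
        x = x.permute(0, 2, 1, 3).flatten(2)                 # (B, T, 32 * n_mels/4)
        d = self.diff_emb(difficulty).unsqueeze(1).expand(-1, x.size(1), -1)
        h, _ = self.rnn(torch.cat([x, d], dim=-1))
        return torch.sigmoid(self.out(h)).squeeze(-1)        # (B, T) onset probabilities

class StepNet(nn.Module):
    """Autoregressive LSTM over selected onset times -> one of 256 arrow combinations
    (4 arrows x 4 states is an assumption about the step vocabulary)."""
    def __init__(self, n_combos=256, n_diff=5, hidden=128):
        super().__init__()
        self.step_emb = nn.Embedding(n_combos, 64)
        self.diff_emb = nn.Embedding(n_diff, 16)
        self.rnn = nn.LSTM(64 + 16 + 1, hidden, batch_first=True)  # +1 for time gap
        self.out = nn.Linear(hidden, n_combos)

    def forward(self, prev_steps, gaps, difficulty):
        # prev_steps: (B, N) previous arrow combos; gaps: (B, N) seconds since last step
        d = self.diff_emb(difficulty).unsqueeze(1).expand(-1, prev_steps.size(1), -1)
        x = torch.cat([self.step_emb(prev_steps), d, gaps.unsqueeze(-1)], dim=-1)
        h, _ = self.rnn(x)
        return self.out(h)                                   # logits over next arrow combo
```

At generation time, the onset network's per-frame probabilities would be thresholded or peak-picked to choose step times, and the step network would then be sampled autoregressively at those times.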
Udo et al. (2020–2023, Hokkaido Univ.) – A series of works focusing on difficulty-controlled chart generation for DDR/StepMania. Their system first uses a deep neural network (trained on StepMania data) to detect musically salient instants (onsets) and produce an initial step chart, given an audio clip and a desired difficulty level. They then apply a refinement filter that removes or prunes steps to match the target density/difficulty. Essentially, the DNN predicts a high-density chart (often too difficult) and the filter uses a reference targets-per-measure (TPM) profile to sparsify the chart for lower difficulties. This approach ensures the number of arrows (note density) aligns with the chosen difficulty without simply scaling uniformly – it preserves rhythmically important notes while dropping others. Udo et al. reported that their difficulty-controlled charts better match intended difficulty levels (including generating easier charts with “sparse target density” that still follow the music). This research, culminating in a 2023 journal paper, demonstrates a near-production technique for automatically generating multi-level DDR charts, with an emphasis on accurate difficulty tuning.
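A rough sketch of the refinement idea: take the dense chart predicted by the network and, measure by measure, keep only the most confident steps until the target notes-per-measure count is met. The function below is an assumed interface for illustration, not Udo et al.'s code.

```python
# Sketch of density refinement: start from a dense predicted chart and keep only
# the highest-confidence steps in each measure until the target
# "targets per measure" (TPM) count is reached.
from collections import defaultdict

def prune_to_tpm(steps, seconds_per_measure, target_tpm):
    """steps: list of (time_sec, confidence, arrows); target_tpm: desired notes per measure."""
    by_measure = defaultdict(list)
    for time_sec, confidence, arrows in steps:
        measure = int(time_sec // seconds_per_measure)
        by_measure[measure].append((confidence, time_sec, arrows))

    kept = []
    for candidates in by_measure.values():
        # keep the most confident (most musically salient) steps first
        candidates.sort(key=lambda c: c[0], reverse=True)
        kept.extend((t, conf, arrows) for conf, t, arrows in candidates[:target_tpm])
    return sorted(kept, key=lambda s: s[0])  # back in chronological order
```

Because pruning is per measure and confidence-ordered, an easier chart retains the rhythmically strongest steps rather than being a uniform subsample of the hard chart.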
Beat Sage (2020) – A widely used AI web tool for automatically generating Beat Saber maps from any song. Beat Sage’s pipeline uses two neural networks trained on thousands of human-made Beat Saber maps. The first network analyzes the audio and predicts when to place blocks (note onsets), and the second network assigns a block type and direction to each timing (e.g. a left-hand up-swing, a right-hand down-swing, or simultaneous blocks for both hands). The system supports multiple difficulty levels – at launch it could generate maps in Normal, Hard, Expert, and Expert+ difficulties, mimicking the style of official Beat Saber tracks. In practice, users can select a difficulty and the AI produces a chart at that level. Beat Sage is considered close to production quality: reviewers noted it generates shockingly good results for many songs, often comparable in consistency and fun to community-made maps. The maps tend to match the beat and flow of the music well (especially for electronic or strongly rhythmic songs), though very complex or slow songs remain challenging. The developers (C. Donahue and A. Agarwal) continue to refine the model to avoid rare mapping issues (e.g. awkward patterns or vision blockers). Beat Sage demonstrates a practical, high-quality application of deep learning in a commercial-style rhythm game, with user-tunable difficulty and broad usage (it has been used to generate tens of thousands of custom Beat Saber levels).
DeepSaber (2020) – A research project (Master’s thesis) that built upon DDC to generate Beat Saber charts. DeepSaber implemented a multi-component deep learning approach to handle the high-dimensional choreography of VR rhythm games (which involve placement in 3D space and hand-specific notes). It introduced the idea that “beat maps are sentences; actions are words”, using NLP-inspired techniques. For example, the author created action embeddings (using Word2Vec/FastText) for Beat Saber blocks to encode their similarity/semantics. The model itself was a multi-stream LSTM architecture that ingests multiple inputs – audio features (MFCC), beat information, and even partial chart context (including difficulty as a feature) – to predict sequences of block placements. DeepSaber also proposed new evaluation metrics: a local metric based on the action embeddings (to see if generated patterns locally resemble known good patterns) and a global novelty/diversity metric to compare the distribution of generated patterns to human ones. In comparisons, DeepSaber’s results were measured against other AI mappers (including an Oxford VR lab model and Beat Sage) on Expert-level songs. While primarily a research exploration, it showed that advanced deep learning (with embedding techniques and multi-input LSTMs) can generate reasonably playable Beat Saber maps. Some informal feedback noted DeepSaber produced more predictable patterns at lower difficulties, while other tools handled higher-complexity patterns better, suggesting potential to specialize models per difficulty. Overall, DeepSaber contributed novel ideas (pattern embeddings, multi-input networks) to improve AI choreography for VR rhythm games.
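The "actions are words" idea can be illustrated in a few lines of gensim: treat each map as a sentence of action tokens and let Word2Vec learn an embedding in which blocks that appear in similar contexts end up close together. The token format below is a made-up stand-in for whatever encoding the thesis actually used.

```python
# Sketch of action embeddings for Beat Saber blocks (assumed token format).
from gensim.models import Word2Vec

# hypothetical token format: hand_column_row_cutDirection, one token per block event
example_maps = [
    ["L_0_0_up", "R_3_0_down", "L_1_1_left", "R_2_1_right"],
    ["L_0_0_up", "R_3_0_down", "both_1_0_any"],
]

model = Word2Vec(sentences=example_maps, vector_size=64, window=5, min_count=1, sg=1)
vec = model.wv["L_0_0_up"]                           # 64-dim embedding for this action
similar = model.wv.most_similar("L_0_0_up", topn=3)  # actions used in similar contexts
```

With a real corpus of maps rather than this toy input, nearest neighbors in the embedding space would surface blocks that tend to co-occur in the same patterns, which is the property DeepSaber's local evaluation metric builds on.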
Sypteras “AIsu” (2018) – An early open-source project by Nick Sypteras that generates Osu! standard mode beatmaps (clickable circles) from audio. This system uses deep learning to decide when to place a hit object, and some procedural logic for where to place it on the screen. Specifically, Sypteras trained a CNN-based model on hundreds of high-quality Osu! beatmaps, feeding in mel-scaled spectrogram windows to classify each time frame into one of four classes: no note, note in medium difficulty only, note in hard difficulty only, or notes in both. By training on song segments that had paired medium and hard charts, the model effectively learned to generate two difficulty levels simultaneously. The output “hit vectors” for medium and hard were then decoded into two separate beatmaps. To position the circles, AIsu did not use a fully learned approach; instead, it employed a Markov chain and heuristic rules to arrange notes in patterns, aiming to mimic the flow of human-designed patterns. (For example, it would choose note coordinates based on a learned distribution of angles and distances between consecutive notes.) The resulting maps were not perfect, but they were playable and demonstrated the feasibility of end-to-end generation of Osu! levels. The project’s web demo allowed users to input a song and get an Osu! beatmap with two difficulties. This work is notable for explicitly modeling difficulty (medium vs hard) and for being one of the first community-driven deep learning beatmap generators, paving the way for more advanced models.
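A small sketch of how a single four-way classifier yields two difficulties at once: each frame's predicted class is routed into a medium hit vector, a hard hit vector, or both. The class indices and frame rate below are assumptions for illustration, not the project's actual values.

```python
# Sketch of decoding 4-class frame predictions into two difficulty-specific hit vectors.
import numpy as np

NO_NOTE, MEDIUM_ONLY, HARD_ONLY, BOTH = 0, 1, 2, 3

def decode_hit_vectors(frame_classes, frame_times):
    """frame_classes: per-frame argmax of the CNN's 4-way softmax."""
    medium_hits, hard_hits = [], []
    for cls, t in zip(frame_classes, frame_times):
        if cls in (MEDIUM_ONLY, BOTH):
            medium_hits.append(t)
        if cls in (HARD_ONLY, BOTH):
            hard_hits.append(t)
    return medium_hits, hard_hits

# example: 10 ms frames with stand-in predictions instead of real model output
times = np.arange(0, 1.0, 0.01)
classes = np.random.randint(0, 4, size=len(times))
medium, hard = decode_hit_vectors(classes, times)
```

The two hit vectors would then be passed to the separate placement stage (the Markov-chain heuristics) to produce the final medium and hard beatmaps.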
BeatLearning (sedthh, 2024) – A recent open-source initiative that leverages state-of-the-art generative AI (transformers) to create beatmaps for various rhythm games, with initial focus on Osu! standard mode. The system converts beatmaps into a tokenized sequence format called BEaRT (each 100ms slice of time is a token encoding up to two note events). Paired with each token sequence are corresponding audio features (Mel spectrogram slices). BeatLearning uses a transformer-based model (inspired by BERT and GPT) that is trained to predict the next token given the past tokens and some “future” audio context (a masked-language-modelling style training with an encoder-decoder mix). Essentially, it learns to generate the sequence of hit objects conditioned on the music. This model is designed to be flexible: it can in principle support different game formats (1, 2, or 4 track games are mentioned) and different difficulty levels. In fact, the beta release includes a front-end where a user can upload a song and select a desired difficulty, and the AI will produce a beatmap at that level. Early examples (medium and hard Osu! maps) show promising results, though some manual cleanup may still be needed for polish. BeatLearning is still a work in progress, but it aims to be a general foundation model for rhythm game chart generation. Its use of modern transformer techniques and a tokenizer approach is pushing the field toward more scalable and potentially more generalizable chart generators that could handle multiple games and tunable difficulty. (Notably, the project draws inspiration from Sypteras’s earlier “AIosu” as well as OpenAI’s sequence modeling techniques.)
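To illustrate the tokenizer idea, here is a toy 100 ms slice tokenizer: every slice becomes one integer token encoding up to two quantized note offsets. The actual BEaRT vocabulary and event encoding differ, so treat the id scheme below purely as an assumption.

```python
# Toy tokenizer in the spirit of BEaRT: one token per 100 ms slice,
# encoding up to two note events (id scheme is an assumption).
def tokenize_chart(hit_times, song_length_sec, slice_ms=100, max_events=2):
    """hit_times: hit-object times in seconds -> list of integer tokens, one per slice."""
    n_slices = int(song_length_sec * 1000 // slice_ms) + 1
    slices = [[] for _ in range(n_slices)]
    for t in hit_times:
        idx = int(t * 1000 // slice_ms)
        if len(slices[idx]) < max_events:
            # offset inside the slice, quantized to 10 ms bins (0..9)
            slices[idx].append(int((t * 1000) % slice_ms // 10))

    tokens = []
    for events in slices:
        if not events:
            tokens.append(0)                                 # empty slice
        elif len(events) == 1:
            tokens.append(1 + events[0])                     # ids 1..10: single hit
        else:
            tokens.append(11 + events[0] * 10 + events[1])   # ids 11..110: two hits
    return tokens
```

A transformer can then be trained to predict the next chart token from the previous tokens plus the aligned audio slices, which is the sequence-modelling setup described above.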
The table below summarizes the projects discussed above, along with a few related systems (Liang et al., TaikoNation, and GenéLive!), highlighting the game targets, AI approaches, difficulty handling, and notable results:
Project (Year) | Target Game(s) | Approach (Models) | Difficulty Support | Output Quality / Notes |
---|---|---|---|---|
Dance Dance Convolution (2017) | DDR (StepMania) 4-panel | CNN + LSTM for timing; conditional LSTM for steps (two-stage) | Yes – conditioned on difficulty input | Playable DDR charts; user demo with ~3.87/5 satisfaction. First deep learning DDR chart generator, baseline for later work. |
Udo et al. (2020–2023) | DDR (StepMania) 4-panel | CNN/LSTM onset detector + rule-based refinement filter | Yes – generates then prunes to target density | Multi-level charts with accurate note density for each level. Used a reference TPM (targets per measure) profile to match intended difficulty. |
Liang et al. (2019) | Osu!mania 4-key | BLSTM (C-BLSTM) sequence model, “fuzzy” labels for ambiguity | Yes – difficulty treated as input feature | Improved F-score (0.84) for timing; charts felt more natural than prior work. Focus on supervised PCG for 4-key mode. |
Beat Sage (2020) | Beat Saber (VR) | 2x Neural Nets (CNN/LSTM-style) – one for timing, one for block placement | Yes – supports Normal through Expert+ difficulties | Production-quality auto-mapper; generated maps often rival community maps in fun for suitable songs. Widely used via web. |
DeepSaber (2020) | Beat Saber (VR) | Multi-input LSTM; action-embedding + MLSTM architecture | Yes – difficulty and other features included | Research project (thesis). Introduced action “word” embeddings and novel metrics. Showed feasibility of ML for complex VR patterns. |
Sypteras “AIsu” (2018) | Osu! standard (clickable circles) | CNN classifier for hits + heuristic placement (Markov chain) | Yes – outputs two fixed difficulties (med & hard) simultaneously | First community DL mapper for Osu!. Web demo generated playable maps (required some manual tweaking). Established deep learning baseline for Osu!. |
BeatLearning (2024) | Osu! standard (expanding to others) | Transformer-based sequence generative model (BERT/GPT hybrid) | Yes – user selects difficulty; model conditioned on it | Ongoing open-source project. Early results show promising beatmaps (without sliders yet). Aims to be general foundation model for rhythm games. |
TaikoNation (2021) | Taiko (2-pad drum) | LSTM RNN focused on pattern sequence generation | Partial – focus on pattern quality, difficulty not main focus | Produced more human-like note patterns than prior ML approaches. Emphasized congruent patterns (key for playability in Taiko). |
GenéLive (2023) | Love Live! & similar (mobile, multi-track) | CNN + RNN (onset & sym modules) with beat guide and multi-scale CNN enhancements | Yes – handles all in-game difficulty modes | Deployed in production (KLab). Halved chart design time, charts meet commercial quality. Open-sourced model used in a live game. |
Sources: The information and outcomes above are drawn from academic papers, project reports, and developer discussions for each system. Each represents a milestone toward automating rhythm game content creation with deep learning, balancing musical alignment, difficulty, and fun.
From 2017’s pioneering Dance Dance Convolution to recent industry models like GenéLive! (2023), deep learning methods have rapidly advanced in generating beat game charts from audio. These systems commonly break the problem into predicting when notes should occur (like a musical onset task) and what actions or patterns to perform (a sequence generation task akin to language modeling). They often incorporate difficulty as a parameter – either by conditioning the model on difficulty or by post-processing the output – so that the generated charts can cater to various skill levels. Significantly, evaluations show that modern AI-generated charts can approach human-made quality: e.g. players were sometimes challenged to distinguish AI maps in terms of playability, and AI assistance is already speeding up professional chart design. While not every generated level is perfect, the trend is clear: leveraging CNNs, RNNs, and transformers (even diffusion models in latest attempts) has made automatic chart generation a practical reality. Ongoing work is improving musical structure awareness, pattern naturalness, and cross-game generalization, moving these tools ever closer to production-ready content creation across a variety of rhythm games. The convergence of academic research, open-source community projects, and commercial adoption suggests a bright future for AI-driven beatmap generation, where players can enjoy “infinite” new levels for their favorite songs at the click of a button.